Abstract: This letter considers the problem of signal denoising
using a sparse tight-frame analysis prior. The $\ell_1$ norm has been
extensively used as a regularizer to promote sparsity; however, it
tends to under-estimate non-zero values of the underlying signal.
To more accurately estimate non-zero values, we propose the use
of a non-convex regularizer, chosen so as to ensure convexity of
the objective function. The convexity of the objective function is
ensured by constraining the parameter of the non-convex penalty.
We use ADMM to obtain a solution and show how to guarantee
that ADMM converges to the global optimum of the objective
function. We illustrate the proposed method for 1D and 2D signal
denoising.

I. Introduction 

A standard technique for estimating sparse signals is
through the formulation of an inverse problem with the $\ell_1$
norm as a convex proxy for sparsity. In particular, consider
the problem of estimating a signal $x \in \mathbb{R}^n$ from a noisy
observation $y \in \mathbb{R}^n$,

$$y = x + w, \tag{1}$$

where $w$ represents additive white Gaussian noise (AWGN). We
assume the underlying signal to be sparse with respect to an
overcomplete tight frame $A \in \mathbb{R}^{m \times n}$, which satisfies
the tight frame condition, i.e.,

$$A^T A = rI, \quad r > 0. \tag{2}$$


Using an analysis prior, we formulate the signal denoising
problem as

$$\arg\min_{x} \left\{ F(x) := \frac{1}{2}\|y - x\|_2^2 + \sum_{i=1}^{m} \lambda_i \phi([Ax]_i; a_i) \right\}, \tag{3}$$

where $\lambda_i > 0$ are the regularization parameters, and
$\phi: \mathbb{R} \to \mathbb{R}$ is a non-smooth sparsity-inducing penalty
function. The parameters $a_i$ control the non-convexity of $\phi$ in
case it is non-convex. The analysis prior is used in image processing
and computer vision applications [6], [7], [16], [30], [32], [36],
[38]. Commonly, the $\ell_1$ norm is used to induce sparsity, i.e.,
$\phi(x) = |x|$ [10], [35]. In that case, problem (3) is strictly
convex and the global optimum can be reliably obtained.

The $\ell_1$ norm is not the tightest envelope of sparsity [21].
It under-estimates the non-zero values of the underlying
signal [8], [26]. Non-zero values can be more accurately
estimated using suitable non-convex regularizers. Non-convex
regularization in an analysis model has been used for MRI
reconstruction [9], EEG signal reconstruction [25], and for
computer vision problems [29]. However, the use of non-convex
regularizers comes at a price: the objective function
is generally non-convex. Consequently, several issues arise
(spurious local minima, a perturbation of the input data can
change the solution unpredictably, convergence is guaranteed
to the local minima only, etc.).

In order to maintain convexity of the objective function
while using non-convex regularizers, we propose to restrict the
parameter $a_i$ of the non-convex regularizer $\phi$. By controlling
the degree of non-convexity of the regularizer we guarantee
that the total objective function $F$ is convex. This idea, which
dates to Blake and Zisserman [3] and Nikolova [26], has been
applied to image restoration and reconstruction [27], [28], total
variation denoising [22], [33], and wavelet denoising [14].

In this letter we provide a critical value of the parameter $a$
to ensure $F$ in (3) is strictly convex (even though $\phi$ is
non-convex). In contrast to the above works, we consider
transform-domain regularization and prove that ADMM [5] applied
to problem (3) converges to the global optimum. The
convergence of ADMM is guaranteed, provided the augmented
Lagrangian parameter $\mu$ satisfies $\mu > 1/r$.
II. Sparse Signal Estimation


A. Non-convex Penalty Functions 

In order to induce sparsity more strongly than the $\ell_1$ norm,
we use non-convex penalty functions $\phi: \mathbb{R} \to \mathbb{R}$ parameterized
by the parameter $a \ge 0$. We make the following assumption
about such penalty functions.

Assumption 1: The non-convex penalty function $\phi: \mathbb{R} \to \mathbb{R}$
satisfies the following:

1) $\phi$ is continuous on $\mathbb{R}$, twice differentiable on $\mathbb{R} \setminus \{0\}$, and
symmetric, i.e., $\phi(-x; a) = \phi(x; a)$

2) $\phi'(x) > 0, \ \forall x > 0$

3) $\phi''(x) \le 0, \ \forall x > 0$

4) $\phi'(0^+) = 1$

5) $\inf_{x \ne 0} \phi''(x; a) = \phi''(0^+; a) = -a$

6) $\phi(x; 0) = |x|$.

Since $\phi(x; 0) = |x|$, the $\ell_1$ norm is recovered as a special
case of the penalty function $\phi$. The parameter $a$ controls the
degree of non-convexity of $\phi$. Note that the $\ell_p$ norm does not
satisfy Assumption 1. The rational penalty function [18],

$$\phi(x; a) = \frac{|x|}{1 + a|x|/2}, \tag{4}$$

the logarithmic, and the arctangent penalty functions [8], [31]
are examples that satisfy Assumption 1. The rational penalty
$\phi$ for $a = 0.4$ is shown in Fig. 1.

Fig. 1. The non-differentiable rational penalty function $\phi(x; a)$ and the
function $s(x; a) = \phi(x; a) - |x|$, $a = 0.4$.
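As a quick numerical sanity check of Assumption 1, the following Python sketch evaluates the rational penalty (4) and the function $s(x; a) = \phi(x; a) - |x|$ plotted in Fig. 1. The helper names (`phi`, `s`, `second_derivative`), the finite-difference step, and the evaluation grid are illustrative choices and not part of the original letter.

```python
def phi(x, a):
    """Rational penalty (4): |x| / (1 + a|x|/2)."""
    return abs(x) / (1.0 + a * abs(x) / 2.0)


def s(x, a):
    """Smooth remainder s(x; a) = phi(x; a) - |x| shown in Fig. 1."""
    return phi(x, a) - abs(x)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x) (assumed step h)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


a = 0.4  # value used in Fig. 1

# Item 4 of Assumption 1: the right-hand slope of phi at 0 should be 1.
slope_at_zero = (phi(1e-8, a) - phi(0.0, a)) / 1e-8

# Item 5: the curvature of phi at 0+ should be -a. Away from 0, s and phi
# have the same second derivative, and s has no kink at 0, so a central
# difference of s across 0 gives a meaningful estimate.
curvature_at_zero = second_derivative(lambda t: s(t, a), 0.0)

# Curvature of s on a grid; it is expected to stay within [-a, 0].
grid = [-10.0 + 0.5 * k for k in range(41)]
curvatures = [second_derivative(lambda t: s(t, a), x) for x in grid]

print(slope_at_zero, curvature_at_zero, min(curvatures), max(curvatures))
```

Setting `a = 0.0` in `phi` returns the absolute value, which corresponds to item 6 of Assumption 1.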

The proximity operator of $\phi$ [12], $\mathrm{prox}_\phi: \mathbb{R} \to \mathbb{R}$, is defined
as

$$\mathrm{prox}_\phi(y; \lambda, a) := \arg\min_{x} \left\{ \frac{1}{2}(y - x)^2 + \lambda \phi(x; a) \right\}. \tag{5}$$

For $\phi(x; a)$ satisfying Assumption 1, with $a < 1/\lambda$, the
proximity operator is a continuous non-linear threshold function
with $\lambda$ as the threshold value, i.e.,
$\mathrm{prox}_\phi(y; \lambda, a) = 0, \ \forall |y| < \lambda$.
The proximity operator of the absolute value function
is the soft-thresholding function. There is a constant gap
between the identity function and the soft-threshold function,
due to which the non-zero values are underestimated [17].
On the other hand, non-convex penalty functions satisfying
Assumption 1 are specifically designed so that the threshold
function approaches identity asymptotically. These non-convex
penalty functions do not underestimate large values.
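The thresholding behavior described above can be illustrated numerically. The following Python sketch computes the proximity operator (5) for the rational penalty (4) by bisection on the first-order optimality condition $x - |y| + \lambda \phi'(x; a) = 0$, which has a unique root in $(0, |y|)$ when $|y| > \lambda$ and $a < 1/\lambda$. The bisection approach, the function names, and the test values are assumptions made for illustration only.

```python
import math


def phi_prime(x, a):
    """Derivative of the rational penalty (4) for x > 0."""
    return 1.0 / (1.0 + a * x / 2.0) ** 2


def prox_rational(y, lam, a, iters=100):
    """Proximity operator (5) for the rational penalty, assuming a < 1/lam."""
    t = abs(y)
    if t <= lam:                      # inside the threshold: output is zero
        return 0.0
    lo, hi = 0.0, t                   # the minimizer lies between 0 and |y|
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # g(x) = x - |y| + lam * phi'(x) is increasing when a < 1/lam
        if mid - t + lam * phi_prime(mid, a) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.copysign(0.5 * (lo + hi), y)


def soft(y, lam):
    """Soft-thresholding: proximity operator of lam * |x|."""
    return math.copysign(max(abs(y) - lam, 0.0), y)


lam, a = 1.0, 0.5                     # illustrative values with a < 1/lam
for y in (0.5, 2.0, 4.25, 50.0):
    print(y, soft(y, lam), prox_rational(y, lam, a))
```

With these values, an input of 4.25 is mapped to 4.0 by the non-convex threshold function and to 3.25 by soft-thresholding, and the gap between input and output of the non-convex threshold shrinks as the input grows.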

B. Convexity Condition 

In order to benefit from convex optimization principles in
solving (3), we seek to ensure $F$ in (3) is convex by controlling
the parameters $a_i$. For later use, we note the following lemma.

Lemma 1: Let $\phi: \mathbb{R} \to \mathbb{R}$ satisfy Assumption 1. The
function $s: \mathbb{R} \to \mathbb{R}$ defined as

$$s(x; a) := \phi(x; a) - |x|, \tag{6}$$

is twice continuously differentiable and concave with

$$-a \le s''(x; a) \le 0. \tag{7}$$

Proof: Since $\phi$ and the absolute value function are twice
continuously differentiable on $\mathbb{R} \setminus \{0\}$, we need only show
$s'(0^+) = s'(0^-)$ and $s''(0^+) = s''(0^-)$. From Assumption 1,
we have $\phi'(0^+) = 1$, hence $s'(0^+) = \phi'(0^+) - 1 = 0$. Again
by Assumption 1 we have $\phi'(0^-) = -\phi'(0^+) = -1$, hence
$s'(0^-) = \phi'(0^-) + 1 = 0$. Further, $s''(0^+) = \phi''(0^+)$ and
$s''(0^-) = \phi''(0^-) = \phi''(0^+) = s''(0^+)$. Thus the function $s$
is twice continuously differentiable. The function $s$ is concave
since $s''(x) = \phi''(x) \le 0, \ \forall x \ne 0$. Using Assumption 1 it
follows that $-a \le s''(x; a) \le 0$. □

Figure 1 displays the function $s(x; a)$, which is twice
continuously differentiable even though the penalty function $\phi$
is not differentiable. The following theorem states the critical
value of the parameter $a_i$ to ensure the convexity of $F$ in (3).

Theorem 1: Let $\phi(x; a)$ be a non-convex penalty function
satisfying Assumption 1 and $A$ be a transform satisfying
$A^T A = rI$, $r > 0$. The function $F: \mathbb{R}^n \to \mathbb{R}$ defined in
(3) is strictly convex if

$$0 \le a_i < \frac{1}{r \lambda_i}. \tag{8}$$

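Proof: Consider the function $G: \mathbb{R}^n \to \mathbb{R}$ defined as

$$G(x) := \frac{1}{2}\|y - x\|_2^2 + \sum_{i=1}^{m} \lambda_i s([Ax]_i; a_i). \tag{9}$$

Since $G$ is twice continuously differentiable (using Lemma 1),
the Hessian of $G$ is given by

$$\nabla^2 G(x) = I + A^T \mathrm{diag}(\lambda_1 d_1, \ldots, \lambda_m d_m) A, \tag{10}$$

where $d_i = s''([Ax]_i; a_i)$. Using (2), we write the Hessian as

$$\nabla^2 G(x) = \frac{1}{r} A^T A + A^T \mathrm{diag}(\lambda_1 d_1, \ldots, \lambda_m d_m) A \tag{11}$$

$$= A^T \mathrm{diag}\left(\frac{1}{r} + \lambda_1 d_1, \ldots, \frac{1}{r} + \lambda_m d_m\right) A. \tag{12}$$

The transform $A$ has full column rank, from (2), hence
$\nabla^2 G(x)$ is positive definite if

$$\frac{1}{r} + \lambda_i d_i > 0, \quad i = 1, \ldots, m. \tag{13}$$

Thus, $\nabla^2 G(x)$ is positive definite if

$$s''([Ax]_i; a_i) > -\frac{1}{r \lambda_i}. \tag{14}$$

Using Lemma 1, we obtain the critical value of $a_i$ to ensure
the convexity of $G$, i.e.,

$$a_i < \frac{1}{r \lambda_i}. \tag{15}$$

It is straightforward that

$$F(x) = G(x) + \sum_{i=1}^{m} \lambda_i |[Ax]_i|. \tag{16}$$

Thus, being a sum of a strictly convex function and a convex
function, $F$ is strictly convex. □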

Note that if $a_i > 1/(r\lambda_i)$, then the function $G(x)$ is
not convex, as the Hessian of $G(x)$ is not positive definite.
As a result, $1/(r\lambda_i)$ is the critical value of $a_i$ to ensure
the convexity of the function $F$. The following corollary
provides a convexity condition for the situation where the same
regularization parameter is applied to all coefficients.

Corollary 1: For $\lambda_i = \lambda$, $i = 1, \ldots, m$, the function $F$ in
(3) is strictly convex if $0 \le a_i < 1/(r\lambda)$. □

We illustrate the convexity condition using a simple example
with $n = 2$. We set

$$A^T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \end{bmatrix}, \qquad A^T A = 4I, \tag{17}$$

and $\lambda_1 = \lambda_2 = 1$. Theorem 1 states that the function $G$ defined
in (9) is convex for $a_i \le 1/4$ and non-convex for $a_i > 1/4$.
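The convexity condition in this example can be checked numerically. The Python sketch below forms the Hessian (12) for the rational penalty (4) on a grid of points and reports its smallest eigenvalue. It assumes the same regularization parameter $\lambda = 1$ for all four analysis coefficients and uses the second derivative $s''(t; a) = -a/(1 + a|t|/2)^3$ of the rational penalty; the grid and the function names are illustrative.

```python
import math

# Rows of A for the n = 2 example in (17); A^T A = 4 I, so r = 4.
A = [[1.0, 1.0], [1.0, 1.0], [1.0, -1.0], [1.0, -1.0]]
r, lam = 4.0, 1.0  # lam = 1 for every coefficient is an assumption


def s_dd(t, a):
    """Second derivative of s(t; a) = phi(t; a) - |t| for the rational penalty."""
    return -a / (1.0 + a * abs(t) / 2.0) ** 3


def min_eig_hessian(x, a):
    """Smallest eigenvalue of the 2x2 Hessian (12) of G at the point x."""
    h11 = h12 = h22 = 0.0
    for row in A:
        t = row[0] * x[0] + row[1] * x[1]   # analysis coefficient [Ax]_i
        w = 1.0 / r + lam * s_dd(t, a)      # i-th diagonal weight in (12)
        h11 += w * row[0] * row[0]
        h12 += w * row[0] * row[1]
        h22 += w * row[1] * row[1]
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius


grid = [(-3.0 + 0.25 * i, -3.0 + 0.25 * j) for i in range(25) for j in range(25)]
worst_critical = min(min_eig_hessian(p, 0.25) for p in grid)  # a at the critical value
worst_above = min(min_eig_hessian(p, 0.30) for p in grid)     # a above the critical value

print(worst_critical, worst_above)
```

For $a = 0.25$ the smallest eigenvalue stays non-negative on the grid (it reaches zero where an analysis coefficient vanishes), whereas for $a = 0.3$ it is negative at the origin, consistent with the critical value $1/(r\lambda) = 1/4$.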






IEEE SIGNAL PROCESSING LETTERS. VOL. 22 NO. 10, OCTOBER, 2015. 


Fig. 2. Surface plots of the rational penalty function and the function $G$, for
two different values of $a$. Panels: $\phi(x; a)$ with $a = 0.25$; $\phi(x; a)$ with
$a = 0.3$; $G(x)$ with $\lambda = 1$, $a = 0.25$; $G(x)$ with $\lambda = 1$, $a = 0.3$.


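ALGORITHM I

Iterative algorithm for the solution to (3).

Input: $y$, $\lambda_i$, $r$, $a_i$, $\mu$
Initialization: $u = 0$, $d = 0$
Repeat:
    $x \leftarrow \frac{1}{1 + \mu r} \left( y + \mu A^T (u - d) \right)$
    $u_i \leftarrow \mathrm{prox}_\phi([Ax + d]_i; \lambda_i/\mu, a_i)$
    $d \leftarrow d - (u - Ax)$
Until convergence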


It can be seen in Fig. 2 that the function $G$ is convex for
$a_i = 0.25$, even though the penalty function is not convex.
However, when $a_i > 0.25$, the function $G$ (hence $F$) is
non-convex.


III. Algorithm 

A benefit of ensuring convexity of the objective function is
that we can utilize convex optimization approaches to obtain
the solution. In particular, for $\phi(x) = |x|$, the widely used
methods for solving (3) are proximal methods [12], [13] and
ADMM [5], [19].

The convergence of ADMM to the optimum solution is
guaranteed when the functions appearing in the objective
function are convex [15]. The following theorem states that
ADMM can be used to solve (3) with guaranteed convergence,
provided the augmented Lagrangian parameter $\mu$ is appropriately
set. Such a condition on $\mu$ was also given in [22]. Note
that $\mu$ does not affect the solution to which ADMM converges,
but rather the speed at which it converges.

Theorem 2: Let $\phi$ satisfy Assumption 1 and the transform
$A$ satisfy the tight frame condition (2). Let $a_i < 1/(r\lambda_i)$.
The iterative Algorithm I converges to the global minimum of
the function $F$ in (3) if

$$\mu > 1/r. \tag{18}$$

Proof: We re-write the problem (3) using variable splitting [1] as

$$\arg\min_{u, x} \left\{ \frac{1}{2}\|y - x\|_2^2 + \sum_{i=1}^{m} \lambda_i \phi(u_i; a_i) \right\} \tag{19a}$$

$$\text{s.t.} \quad u = Ax. \tag{19b}$$


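The minimization is separable in $x$ and $u$. Applying ADMM
to (19) yields the following iterative procedure with the
augmented Lagrangian parameter $\mu$:

$$x \leftarrow \arg\min_{x} \left\{ \frac{1}{2}\|y - x\|_2^2 + \frac{\mu}{2}\|u - Ax - d\|_2^2 \right\} \tag{20a}$$

$$u \leftarrow \arg\min_{u} \underbrace{\left\{ \sum_{i=1}^{m} \lambda_i \phi(u_i; a_i) + \frac{\mu}{2}\|u - Ax - d\|_2^2 \right\}}_{R(u)} \tag{20b}$$

$$d \leftarrow d - (u - Ax) \tag{20c}$$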


The sub-problem (20a) for $x$ can be solved explicitly as

$$x = \left(I + \mu A^T A\right)^{-1} \left(y + \mu A^T (u - d)\right) \tag{21}$$

$$= \frac{1}{1 + \mu r} \left(y + \mu A^T (u - d)\right), \tag{22}$$

using (2). The sub-problem (20b) for $u$ can be solved using
$\mathrm{prox}_\phi$, provided the function $R$ is convex. Consider the
function $Q: \mathbb{R}^m \to \mathbb{R}$ defined as

$$Q(u) := \sum_{i=1}^{m} \lambda_i s(u_i; a_i) + \frac{\mu}{2}\|u - Ax - d\|_2^2. \tag{23}$$

From Lemma 1 and the proof of Theorem 1, $\nabla^2 Q(u)$ is
positive definite if

$$s''(u_i; a_i) > -\frac{\mu}{\lambda_i} \iff \mu > a_i \lambda_i. \tag{24}$$

Since $a_i < 1/(r\lambda_i)$, it follows that $\nabla^2 Q(u)$ is positive definite
if $\mu > 1/r$. Hence $Q$ is strictly convex for $\mu > 1/r$. Note that
$R(u) = Q(u) + \sum_{i=1}^{m} \lambda_i |u_i|$. Hence, the function $R$, being the sum of
a convex and a strictly convex function, is strictly convex. As
such, the minimization problem in (20b) is well-defined and
its solution can be efficiently computed using the proximity
operator of $\phi$ (5), i.e.,

$$u_i \leftarrow \mathrm{prox}_\phi\left([Ax + d]_i; \lambda_i/\mu, a_i\right). \tag{25}$$

Since $A$ has full column rank, ADMM converges to a
stationary point of the objective function (despite having a
non-convex function in the objective) [24], [37]; see also [4],
[20], [23]. Moreover, the function $F$ is strictly convex (by
Theorem 1) and the sub-problems of the ADMM are strictly
convex for $\mu > 1/r$. As a result, the iterative procedure (20)
converges to the global minimum of $F$. □

A globally convergent algorithm based on a different splitting
is presented in [2]. In that approach, the objective function
is split into two functions, both of which are convex regardless
of the auxiliary parameter value. Hence, no parameter
constraint is required to ensure convergence.
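To make the iteration concrete, the following Python sketch runs the three updates of Algorithm I on the small $n = 2$ example of (17), using the rational penalty (4). The threshold function is computed by bisection rather than in closed form, and the input `y`, the parameter values, the iteration count, and all function names are assumptions chosen for illustration; they satisfy $a < 1/(r\lambda)$ and $\mu > 1/r$.

```python
import math

# Analysis operator from the n = 2 example: A^T A = 4 I, hence r = 4.
A = [[1.0, 1.0], [1.0, 1.0], [1.0, -1.0], [1.0, -1.0]]
R_FRAME = 4.0


def prox_rational(v, thr, a, iters=100):
    """Threshold function of the rational penalty, assuming a < 1/thr."""
    t = abs(v)
    if t <= thr:
        return 0.0
    lo, hi = 0.0, t
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # Bisection on the optimality condition x - |v| + thr * phi'(x) = 0.
        if mid - t + thr / (1.0 + a * mid / 2.0) ** 2 < 0.0:
            lo = mid
        else:
            hi = mid
    return math.copysign(0.5 * (lo + hi), v)


def admm_denoise(y, lam, a, mu, iters=500):
    """Sketch of Algorithm I with one lam and one a shared by all coefficients."""
    m, n = len(A), len(y)
    u = [0.0] * m
    d = [0.0] * m
    x = list(y)
    for _ in range(iters):
        # x-update (22): x = (y + mu * A^T (u - d)) / (1 + mu * r)
        w = [u[i] - d[i] for i in range(m)]
        x = [(y[j] + mu * sum(A[i][j] * w[i] for i in range(m)))
             / (1.0 + mu * R_FRAME) for j in range(n)]
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # u-update (25): threshold each coefficient with threshold lam / mu
        u = [prox_rational(Ax[i] + d[i], lam / mu, a) for i in range(m)]
        # d-update (20c)
        d = [d[i] - (u[i] - Ax[i]) for i in range(m)]
    return x, u


y = [3.0, 2.5]
lam, a, mu = 0.5, 0.4, 1.0   # a < 1/(r*lam) = 0.5 and mu > 1/r = 0.25
x_hat, u_hat = admm_denoise(y, lam, a, mu)
print(x_hat, u_hat)
```

For this particular input the minimizer of $F$ can be worked out by hand as $x = (2.5, 2.5)$: the two analysis coefficients $x_1 - x_2$ are thresholded to zero, and the iterates should approach that point.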


IV. Examples 
A. 1D Signal Denoising

We consider the problem of denoising a 1D signal that
is sparse with respect to the undecimated wavelet transform
(UDWT) [11], which satisfies the condition (2) with $r = 1$.
In particular, we use a 4-scale UDWT with three vanishing
moments. The noisy signal is generated using WaveLab
(http://www-stat.stanford.edu/%7Ewavelab/) with AWGN
of $\sigma = 4.0$. We set the regularization parameters
$\lambda_j = \beta \sigma 2^{-j/2}$, $1 \le j \le 4$. We use the same $\lambda_j$ for all the
coefficients in scale $j$. The value of $\beta$ is chosen to obtain
the lowest RMSE for convex and non-convex regularization
respectively. To maximally induce sparsity we set $a_i = 1/\lambda_i$.
For the 1D signal denoising example, we use the non-convex
arctangent penalty and its corresponding threshold function
[31]. For comparison we use reweighted $\ell_1$ minimization [8],
with $\beta$ chosen in order to obtain the lowest RMSE.

Fig. 3. 1D denoising example (horizontal axis: time (n)). Non-convex
regularization yields lower RMSE than convex regularization.

Fig. 4. RMSE values as a function of the noise level $\sigma$ for the 1D signal
denoising example.

Figure 3 shows that the denoised signal obtained using non-convex
regularization has the lowest RMSE and preserves
the discontinuities. Further, the peaks are less attenuated
using non-convex regularization in comparison with $\ell_1$ norm
regularization.

For further comparison, we generate the noisy signal in
Fig. 3 for $1 \le \sigma \le 4$, and denoise it with non-convex and
convex regularization. We also denoise the noisy signal by
direct non-linear thresholding of the noisy wavelet coefficients
and by reweighted $\ell_1$ minimization. We use the same $\beta$ values
as in Fig. 3. The value of $\beta$ for direct non-linear thresholding
is also chosen to obtain the lowest RMSE. As seen in Fig. 4,
the non-convex regularization outperforms the other three methods
by giving the lowest RMSE. The RMSE values are obtained
by averaging over 15 realizations for each $\sigma$.



Fig. 5. Image denoising. Wavelet artifacts are more prominent when using
$\ell_1$ norm regularization.

Fig. 6. Relative performance of convex and non-convex regularization for
image denoising. (a) PSNR as a function of $\lambda$. (b) PSNR as a function of $\sigma$.


B. 2D Image Denoising 

We consider the problem of denoising a 2D image corrupted
with AWGN. We use the 2D dual-tree complex wavelet
transform (DT-CWT) [34], which is 4-times expansive and
satisfies (2) with $r = 1$. The noisy 'peppers' image has a peak
signal-to-noise ratio (PSNR) value of 14.6 dB. We use the
same $\lambda$ for all the sub-bands. As in the previous example, we
set the value of $\lambda$ for each case (convex and non-convex) as
a constant multiple of $\sigma$ that gives the highest PSNR.

Figure 5 shows that the denoised image (non-convex case)
contains fewer wavelet artifacts and has a higher PSNR.
Figure 6(a) shows the PSNR values (convex and non-convex)
for different values of $\lambda$. To further assess the performance of
tight-frame non-convex regularization, we realize several noisy
'peppers' images with $10 \le \sigma \le 100$. As in the case of the 1D
signal denoising, Fig. 6(b) shows that non-convex regularization
offers higher PSNR across different noise levels.

V. Conclusion 

This letter considers the problem of signal denoising using
a sparse tight-frame analysis prior. We propose the use of
parameterized non-convex regularizers to maximally induce
sparsity while maintaining the convexity of the total problem.
The convexity of the objective function is ensured by restricting
the parameter $a$ of the non-convex regularizer. We use
ADMM to obtain the solution to the convex objective function
(consisting of a non-convex regularizer), and guarantee its
convergence to the global optimum, provided the augmented
Lagrangian parameter $\mu$ satisfies $\mu > 1/r$. The proposed
method outperforms the $\ell_1$ norm regularization and reweighted
$\ell_1$ minimization methods for signal denoising.